Detecting Japanese idioms with a linguistically rich dictionary
Abstract
Detecting idioms in a sentence is important to sentence understanding. This paper discusses the linguistic knowledge for idiom detection. The challenges are that idioms can be ambiguous between literal and idiomatic meanings, and that they can be “transformed” when expressed in a sentence. However, there has been little research on Japanese idiom detection with its ambiguity and transformations taken into account. We propose a set of linguistic knowledge for idiom detection that is implemented in an idiom dictionary. We evaluated the linguistic knowledge by measuring the performance of an idiom detector that exploits the dictionary. As a result, more than 90% of the idioms are detected with 90% accuracy.
Similar resources
Review: A Dictionary of Common Persian Idioms and Expressions (Persian – English)
A Dictionary of Common Persian Idioms and Expressions (Persian – English), compiled by Professor Dr. Mohammad Reza Bateni with the assistance of Zahra Ahmadinia, was published in 1392 SH (2013/14) by Farhang Moaser Publishers in 1089 pages. In English there are numerous dictionaries whose entries are devoted to explaining the idioms and phrases of the language. The Collins COBUILD Dictionary of Idioms[1], the Cambridge Dictionary of Idioms[2] and the American Heritage Dictionary of Idioms[3] are examples of such...
NTT DATA at TREC-7: System Approach for Ad-Hoc and Filtering
In TREC-7, we participated in the ad-hoc task (main task) and the filtering track (sub task). In the ad-hoc task, we adopted a scoring method that used co-occurrence term relations in a document and specific processing in order to determine which conceptual parts of the documents should be targeted for query expansion. In filtering, we adopted a machine-readable dictionary for detecting idioms and an...
Making an XML-based Japanese-Slovene Learners' Dictionary
In this paper we present a hypertext dictionary of Japanese lexical units for Slovene students of Japanese at the Faculty of Arts of Ljubljana University. The dictionary is planned as a long-term project in which a simple dictionary is to be gradually enlarged and enhanced, taking into account the needs of the students. Initially, the dictionary was encoded in a tabular format, in a mixture of ...
Idiomatic Expressions in VerbaLex
Idiomatic expressions are part of everyday language, therefore NLP applications that can “understand” idioms are desirable. The nature of idioms is somewhat heterogeneous: idioms form classes differing in many aspects (e.g. syntactic structure, lexical and syntactic fixedness). Although dictionaries of idioms exist, they usually do not contain information about fixedness or frequency since they...
A New Dictionary Construction Method in Sparse Representation Techniques for Target Detection in Hyperspectral Imagery
Hyperspectral data in remote sensing, which are gathered at fine spectral resolution (about 10 nanometers), contain a plethora of spectral bands (roughly 200 bands). Since precious information about the spectral features of target materials can be extracted from these data, they have been used exclusively in hyperspectral target detection. One of the problems associated with the detect...
Journal title:
- Language Resources and Evaluation
Volume 40, Issue
Pages -
Publication date 2006